Shift of pairwise similarities for data clustering
Abstract
Several clustering methods (e.g., Normalized Cut and Ratio Cut) divide the Min Cut cost function by a cluster-dependent factor (e.g., the size or the degree of the clusters), in order to yield a more balanced partitioning. We, instead, investigate adding such regularizations to the original cost function. We first consider the case where the regularization term is the sum of the squared sizes of the clusters, and then generalize it to adaptive regularization of the pairwise similarities. This leads to shifting (adaptively) the pairwise similarities, which might make some of them negative. We then study the connection of this method to Correlation Clustering and propose an efficient local search optimization algorithm with a fast theoretical convergence rate to solve the new clustering problem. In the following, we investigate the shift of pairwise similarities on some common clustering methods, and finally, we demonstrate the superior performance of the method by extensive experiments on different datasets.
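The link between the squared-size regularization and a shift of the similarities can be checked numerically. The sketch below is not the paper's implementation: the function names, the constant shift `alpha`, and the greedy reassignment loop are illustrative assumptions. It relies on the fact that the sum of squared cluster sizes equals the number of ordered same-cluster pairs, so the Min Cut cost plus `alpha` times that sum equals the total similarity minus the within-cluster sum of the shifted similarities `X - alpha`.

```python
import numpy as np

def min_cut_cost(X, labels):
    """Sum of similarities over ordered pairs (i, j) in different clusters."""
    different = labels[:, None] != labels[None, :]
    return float(X[different].sum())

def regularized_cost(X, labels, alpha):
    """Min Cut cost plus alpha times the sum of squared cluster sizes."""
    _, sizes = np.unique(labels, return_counts=True)
    return min_cut_cost(X, labels) + alpha * float((sizes ** 2).sum())

def shifted_within_sum(X, labels, alpha):
    """Sum of shifted similarities (X - alpha) over ordered same-cluster pairs."""
    same = labels[:, None] == labels[None, :]
    return float((X - alpha)[same].sum())

def local_search(X, k, alpha, n_sweeps=100, seed=0):
    """Generic greedy local search (an assumption, not the paper's algorithm).

    Each object is moved to the cluster with which it has the largest total
    shifted similarity; for symmetric X every move increases the
    within-cluster sum of shifted similarities.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    S = X - alpha                # shifted similarities, some may be negative
    np.fill_diagonal(S, 0.0)     # self-similarity does not affect the choice
    labels = rng.integers(0, k, size=n)
    for _ in range(n_sweeps):
        changed = False
        for i in range(n):
            scores = np.array([S[i, labels == c].sum() for c in range(k)])
            best = int(np.argmax(scores))
            if scores[best] > scores[labels[i]]:
                labels[i] = best
                changed = True
        if not changed:          # no single move improves the objective
            break
    return labels

# Toy data: two groups of three objects, similarity 1 inside a group, 0 across.
X = np.kron(np.eye(2), np.ones((3, 3)))
labels = local_search(X, k=2, alpha=0.5)
lhs = regularized_cost(X, labels, alpha=0.5)
rhs = X.sum() - shifted_within_sum(X, labels, alpha=0.5)
print(labels, np.isclose(lhs, rhs))
```

With `alpha` between the typical within-group and across-group similarity, pairs from different groups get negative shifted similarities, which is the setting where the connection to Correlation Clustering mentioned in the abstract arises.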
Similar resources
Cost functions for pairwise data clustering
Cost functions for non-hierarchical pairwise clustering are introduced, in the probabilistic autoencoder framework, by the request of maximal average similarity between input and the output of the autoencoder. Clustering is thus formulated as the problem of finding the ground state of Potts spins Hamiltonians. The partition, provided by this procedure, identifies clusters with dense connected r...
Pairwise Data Clustering by Deterministic Annealing
Partitioning a data set and extracting hidden structure from the data arises in different application areas of pattern recognition, speech and image processing. Pairwise data clustering is a combinatorial optimization method for data grouping which extracts hidden structure from proximity data. We describe a deterministic annealing approach to pairwise clustering which shares the robustness pro...
Entropy-based Consensus for Distributed Data Clustering
The increasingly larger scale of available data and the more restrictive concerns on their privacy are some of the challenging aspects of data mining today. In this paper, Entropy-based Consensus on Cluster Centers (EC3) is introduced for clustering in distributed systems with a consideration for confidentiality of data; i.e. it is the negotiations among local cluster centers that are used in t...
Context clustering for Word Sense Disambiguation based on modeling pairwise context similarities
Traditionally, word sense disambiguation (WSD) involves a different context model for each individual word. This paper presents a new approach to WSD using weakly supervised learning. Statistical models are not trained for the contexts of each individual word, but for the similarities between context pairs at category level. The insight is that the correlation regularity between the sense disti...
The clustering and classification data mining techniques in insurance fraud detection: the case of Iranian car insurance
Given the ever-increasing spread of fraud in the insurance sector, especially in car insurance, and its negative consequences for insurance companies, applying suitable and efficient methods to identify and detect fraud in this area is essential. Understanding the patterns in data on previously reported claims can help determine whether a damage claim is genuine or not. One of the most common and widely used ways of discovering patterns in data is the use of ...
Journal
Journal title: Machine Learning
Year: 2022
ISSN: 0885-6125, 1573-0565
DOI: https://doi.org/10.1007/s10994-022-06189-6